INTRODUCTION


Complex man-machine systems are often limited by comparatively slow and
inefficient data entry methods. The recent proliferation of computer
peripherals, such as trackballs, "mice", touch screen displays, digitizing
panels, and voice command systems, attests to this problem. The most
complex solution, voice recognition, is also the most attractive because
speech is the richest and most natural way for people to communicate.
Machines are available that decipher phonetic patterns into text with over
90% accuracy, but in critical situations even a small number of errors can
significantly degrade performance. It is therefore of interest to identify
words that are consistently unintelligible or mistaken for other words, and
to avoid their use in voice command systems. The Data Entry Vocabularies in
a C3 Environment (DEVICE) study was performed during December 1983-January
1984 and used a vocabulary from a prior AFAMRL experiment to investigate
error rates. The results are presented here.


BACKGROUND 

Several "problem" words emerged during the SIMCOPE-1 study, performed by
Dr. Peter Crane (Univ. of Pittsburgh) and Capt Dave Leupp (AFAMRL/HEC),
which incorporated voice command as one of two data entry methods in a
simulated missile warning crewstation. A pilot study to find word
recognition error rates for the entire SIMCOPE-1 vocabulary produced more
candidate words. Replacement words with meanings similar to the problem
words were chosen for phonetic dissimilarity and combined with the original
list of 95 words, giving a total of 125 words.


EXPERIMENTAL  DESIGN 

The experimental vocabulary contains three groups of words (Table 1): the
control group (65 words that caused few errors), the original group of
problem words (30 words), and the replacement group (30 words). Two
vocabularies were used: control plus original (V1) and control plus
replacement (V2). Ten subjects from a subject pool were trained on the
voice recognition equipment. Each subject repeated all 125 words ten times
to allow the machine to "learn" the pronunciation of each word, and this
information was then stored on tape. The system used was speaker-dependent,
requiring a different tape for each subject. Training was supervised by
the experimenter, who coached the subjects to avoid monotonous
pronunciation. (Since word inflection is invariably different during
training than during use, the processor, which averages the training
pronunciations, will recognize words better if various inflections are used
during training.)

During a session, a subject was seated in a soundproof room in front of a
terminal, wearing a head-mounted microphone. Microphone placement has been
shown to be an important factor in recognition, so placement was
supervised during training and trials. Each trial was initiated by the
subject and consisted of three randomly ordered iterations of V1 or V2,
with each word flashing onto the screen at random intervals (1-3 sec) and
remaining on the screen for 0.35 sec. The subject attempted to read the
word into the microphone before the next word appeared. Time stress was
present to simulate a more realistic setting and to prevent the subject
from lapsing into a monotone (yet few subjects skipped words because of
it). Subjects could interrupt the experiment at any time and had two
mandatory breaks per trial. Sessions lasted approximately 25 minutes,
consisting of two trials separated by a five-minute break, for a total of
five breaks. Each session included one trial of V1 and one of V2 in
varying order. The session order for subjects is shown in Table 2. Audio
tapes of the sessions were made.
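
The presentation procedure described above can be sketched as a small
schedule generator. This is only an illustration, not the PDP 11/40 program
used in the study: the function name, the uniform distribution of the
intervals, and the choice to measure the 1-3 sec interval from one word
onset to the next are all assumptions.

```python
import random


def build_trial_schedule(vocabulary, iterations=3, min_gap=1.0, max_gap=3.0,
                         display_time=0.35, seed=None):
    """Return (onset_sec, offset_sec, word) tuples for one trial.

    Assumptions (not stated in the source): gaps are drawn uniformly and
    are measured from one word onset to the next.
    """
    rng = random.Random(seed)
    schedule = []
    t = 0.0
    for _ in range(iterations):
        order = list(vocabulary)
        rng.shuffle(order)                       # each iteration is shuffled independently
        for word in order:
            t += rng.uniform(min_gap, max_gap)   # random interval before this word
            schedule.append((t, t + display_time, word))
    return schedule


# Toy subset of the vocabulary, for illustration only.
toy_vocabulary = ["YES", "NO", "KNOWN", "BURF"]
for onset, offset, word in build_trial_schedule(toy_vocabulary, seed=1):
    print(f"{onset:6.2f}-{offset:6.2f} sec  {word}")
```

Under these assumptions, a 95-word vocabulary presented three times at a
mean interval of 2 sec takes about 285 x 2 = 570 sec, or roughly 9.5
minutes per trial, which is in line with two trials plus breaks fitting
into a session of about 25 minutes.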

Equipment used included a Threshold 600 voice recognition unit, a Shure
SM10A head-mounted microphone worn by the subjects, a DEC PDP 11/40
minicomputer that presented trials and collected data, and a Tascam 44
audio tape recorder.


RESULTS 

Two types of errors were recorded: misrecognition, or confusion with
another word, and nonrecognition, or failure of the system to match the
word with the training pattern. Nonrecognitions can be frustrating to a
user (especially with the usual audible feedback), but misrecognitions are
more dangerous to system performance because they can go undetected.
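
The two error types can be made concrete with a short tallying sketch. The
function names and the trial log below are hypothetical, and representing
"no match" as `None` is an assumption about how a recognizer's output might
be recorded, not a description of the Threshold 600 interface.

```python
from collections import Counter


def classify(prompted, recognized):
    """Classify one utterance; `recognized` is None when no match was reported."""
    if recognized is None:
        return "nonrecognition"
    if recognized == prompted:
        return "correct"
    return "misrecognition"


def error_rates(trials):
    """Return misrecognition, nonrecognition, and overall error rates."""
    counts = Counter(classify(prompted, recognized) for prompted, recognized in trials)
    n = len(trials)
    mis = counts["misrecognition"] / n
    non = counts["nonrecognition"] / n
    return {"misrecognition": mis, "nonrecognition": non, "overall": mis + non}


# Hypothetical log of (prompted word, recognizer output) pairs.
log = [("NORTH", "NORTH"), ("NORTH", "FOUR"), ("WEST", None), ("DELTA", "DELTA")]
print(error_rates(log))
```

Because every error is counted as exactly one of the two types, the overall
error rate is the sum of the misrecognition and nonrecognition rates.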

The error rates for each group are shown in Table 3. It is clear that the
intuitive criteria used to select the replacement group did not result in a
superior vocabulary. In fact, the replacement words had a significantly
higher misrecognition rate than the original words (χ² = 6.00, p < .05).
One possible reason is that many of the subjects, drawn from a limited
pool, had unavoidably been subjects for the SIMCOPE-1 study and were more
familiar with the original words (V1). Table 4 shows the word replacement
pairs, each having one original and one replacement word, for which error
rates differed significantly. The overall error rate of the best words from
each pair was 6.4%, compared with 18.5% for the worst words.
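
As a rough illustration of how the error counts of a word pair might be
compared, the sketch below applies a 2x2 χ² test (errors versus correct
recognitions for each word). The source states only that a χ² test at
p < .05 was used; the specific form shown here (1 degree of freedom, no
continuity correction) and the function name are assumptions.

```python
CRITICAL_05 = 3.841  # chi-square critical value for 1 degree of freedom, p = .05


def chi_square_2x2(errors_a, errors_b, trials_a=180, trials_b=180):
    """Chi-square statistic for a 2x2 table of errors vs. correct recognitions."""
    a, b = errors_a, trials_a - errors_a      # word A: errors, correct
    c, d = errors_b, trials_b - errors_b      # word B: errors, correct
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                            # e.g. neither word produced any errors
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Overall errors for BURF vs. BRF and misrecognitions for LOCATE vs.
# COORDINATES, as listed in Table 4 (180 trials per word).
for name, x, y in [("BURF/BRF", 53, 10), ("LOCATE/COORDINATES", 0, 4)]:
    chi2 = chi_square_2x2(x, y)
    print(f"{name}: chi2 = {chi2:.2f}, significant at .05: {chi2 > CRITICAL_05}")
```

With the Table 4 counts this gives about 35.6 for BURF/BRF and about 4.04
for LOCATE/COORDINATES, both above the .05 threshold. If the group-level
misrecognition counts are back-computed from the Table 3 percentages,
assuming 30 words x 180 trials = 5400 trials per group (about 104 original
and 142 replacement misrecognitions), the same function returns about 6.0.
Those two counts are an assumption derived from the percentages, not values
reported in the source.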

A closer examination of Table 4 and the least recognized words (Table 5)
allows some hypotheses to be made based on phonetic qualities of "problem"
words:

1. Monosyllabic words are less often recognized than polysyllabic
words.

2. Words ending with T or containing a T which is slurred or absent
in normal speech (eight, west, delta) are also poorly recognized.

Table 6 shows the vocabulary words that fall into these two groups. The
actual number of high-error words in each phonetic group was compared with
the expected number, that is, the number of words in the group multiplied
by the overall probability of a word having an error rate over 10% (39/125)
(Table 7). Vocabulary words that belonged to either of these groups were
significantly more likely to have an overall error rate greater than 10%.
Words that belonged to both groups made too small a sample to show a
significant difference in rate. These word groups should be avoided by
system designers, especially because words belonging to neither group were
significantly less likely to have an error rate over 10% (13 of 80 words).
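
The expected counts described above, and a χ² statistic built from them, can
be reproduced with a few lines of arithmetic. This is a sketch under
assumptions: the source does not spell out the formula, so a one-degree-of-
freedom goodness-of-fit statistic over the two cells (at most 10% errors,
over 10% errors) is assumed, and the function names are illustrative. The
group sizes and actual counts are taken from Table 7.

```python
P_HIGH = 39 / 125  # overall share of words with an error rate over 10%

# (group, total words, words with > 10% errors), as listed in Table 7
GROUPS = [
    ("Monosyllables", 21, 12),
    ("Final or vestigial T", 20, 11),
    ("Both", 4, 3),
    ("Neither", 80, 13),
]


def expected_counts(total, p_high=P_HIGH):
    """Expected number of low-error and high-error words in a group."""
    exp_high = total * p_high
    return total - exp_high, exp_high


def chi_square_gof(total, actual_high, p_high=P_HIGH):
    """Goodness-of-fit chi-square over the low-error and high-error cells."""
    exp_low, exp_high = expected_counts(total, p_high)
    actual_low = total - actual_high
    return ((actual_low - exp_low) ** 2 / exp_low
            + (actual_high - exp_high) ** 2 / exp_high)


for name, total, high in GROUPS:
    exp_low, exp_high = expected_counts(total)
    chi2 = chi_square_gof(total, high)
    print(f"{name:22s} expected low {exp_low:6.2f}  high {exp_high:6.2f}  chi2 {chi2:5.2f}")
```

The results land close to the χ² column of Table 7 but need not match it
digit for digit, since the table appears to work from rounded expected
counts. For one degree of freedom the critical values are 3.841 (p = .05)
and 7.879 (p = .005), so under this calculation the "Both" group falls
short of significance while the "Neither" group exceeds the .005 level.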


TABLE 1. DEVICE Word List


CONTROL

YES                 INDISTINCT          EVENT MESSAGES
NO                  CDC                 SYSTEM REPORTS
NORTH               CWC                 TELEPHONE DIRECTORY
SOUTH               BSS                 DETAIL MAP
CENTRAL             KEY NORTH           EVENT TIMELINE
INN                 NORTH CITY          INTELLIGENCE REPORTS
OUTT                TOLL CITY           OUTPUT FORMAT
WEST                HAYES               REFERENCE DIRECTORY
SUSPECTED           CLEAR               ADS1
ZERO                ASSIGN              ADS2
ONE                 BACKSTEP            CLEAR ENTRY
TWO                 AUTO                ADS
THREE               EDIT                KNOWN SITES
FOUR                WHITE SANDS         MILITARY INSTALLATIONS
FIVE                PINE GROVE          INDUSTRIAL CENTERS
SIX                 SOUTHRICH           ALL EVENTS
SEVEN               HOSTILE             ALL
EIGHT               TEST                ADS GSF
NINE                REASSIGN            BSS GSF
ENTER               SHOW                OCEAN CITY
TYPE I              SUPPRESS            VECTOR
TYPE II             SITUATION MAP


ORIGINAL

KNOWN               UP ARROW            E7
UNKNOWN             DOWN ARROW          E8
FINISH              LEFT ARROW          E9
BURF                RIGHT ARROW         E10
RIVERTON            E1                  NOT CLEAR
LIVINGSTON          E2                  LOCATE
DELTA               E3                  LOG
SOUTHERN            E4                  SUSPECT SITES
SEND                E5                  DELETE
ACKNOWLEDGE         E6                  FAN


REPLACEMENT

IDENTIFIED          SCROLL UP           EVENT 7
UNIDENTIFIED        SCROLL DOWN         EVENT 8
OVER                SCROLL LEFT         EVENT 9
BRF                 SCROLL RIGHT        EVENT 10
ROSEDALE            EVENT 1             UNRESOLVED
LAKEVIEW            EVENT 2             COORDINATES
DAIRYLAND           EVENT 3             MESSAGE LOG
MOUNTAIN            EVENT 4             POSSIBLE SITES
SUBMIT              EVENT 5             REMOVE
OK                  EVENT 6             RANGE



TABLE  2.  Experimental  Design 


TABLE 3. Error Rates of Word Groups


Overall Error Rate         =  8.75%
Control Error Rate         =  8.72%
Original Error Rate        =  8.67%
Replacement Error Rate     =  8.90%

Misrecognitions            =  2.00%
   Control                 =  1.74%
   Original                =  1.93%
   Replacement             =  2.63%

Nonrecognitions            =  6.75%
   Control                 =  6.98%
   Original                =  6.74%
   Replacement             =  6.27%


TABLE 4. Word Pairs With Significantly Different Error Rates
(Total Number of Trials Per Word Group = 180)
(P(χ²), P < .05)


OVERALL ERRORS

ORIGINAL      # ERRORS        REPLACEMENT   # ERRORS

BURF                53        *BRF                10
DELTA               34        *DAIRYLAND          16
*SOUTHERN           21        MOUNTAIN            54
*DOWN ARROW         11        SCROLL DOWN         24
*E4                 13        EVENT 4             27
*E9                  6        EVENT 9             15
*E10                 5        EVENT 10            16
DELETE              44        *REMOVE             17
FAN                 33        *RANGE               7


MISRECOGNITION ERRORS

ORIGINAL      # ERRORS        REPLACEMENT   # ERRORS

*KNOWN               5        IDENTIFIED          17
*UNKNOWN             1        UNIDENTIFIED        14
FINISH               9        *OVER                0
SEND                 7        *SUBMIT              0
LEFT ARROW           7        *SCROLL LEFT         1
*E4                  0        EVENT 4             11
*E5                  3        EVENT 5             10
*E8                 13        EVENT 8             29
*E9                  2        EVENT 9             10
*E10                 1        EVENT 10             8
*LOCATE              0        COORDINATES          4
SUSPECT SITES        5        *POSSIBLE SITES      0
DELETE               6        *REMOVE              0
FAN                  6        *RANGE               0


NONRECOGNITION ERRORS

ORIGINAL      # ERRORS        REPLACEMENT   # ERRORS

KNOWN               32        *IDENTIFIED         13
UNKNOWN             24        *UNIDENTIFIED        6
*FINISH              9        OVER                25
BURF                53        *BRF                 8
DELTA               33        *DAIRYLAND          15
*SOUTHERN           11        MOUNTAIN            45
*DOWN ARROW          9        SCROLL DOWN         24
DELETE              38        *REMOVE             17
FAN                 19        *RANGE               7

(* = best word of each pair, i.e. the word with fewer errors)


TABLE 5. Words With Overall Error Rates Greater Than 10%


Overall Error Rate    Words

11%                   INDISTINCT, SOUTHRICH, SHOW, INTELLIGENCE REPORTS,
                      ADS GSF, UNIDENTIFIED
12%                   SOUTHERN, LEFT ARROW, SUPPRESS
13%                   EDIT, SCROLL DOWN
14%                   NO, CENTRAL, WEST, UNKNOWN, TWO, NINE, E8, OVER,
                      SCROLL UP, SCROLL LEFT
15%                   OUTPUT FORMAT, EVENT 4
17%                   SEVEN, IDENTIFIED
18%                   FAN
19%                   YES, DELTA
20%                   FOUR, EVENT 8
21%                   NORTH, SOUTH, OUTT, KNOWN, EIGHT
22%                   INN
24%                   DELETE
29%                   BURF
30%                   MOUNTAIN


TOTAL = 39 Words
TABLE 6. Phonetic Groups in Vocabulary


                    FINAL OR
MONOSYLLABLES       VESTIGIAL T         BOTH

*YES                ENTER               *OUTT
*NO                 *INDISTINCT         *WEST
*NORTH              RIVERTON            *EIGHT
*SOUTH              *DELTA              TEST
*INN                RIGHT ARROW
*KNOWN              AUTO                TOTAL = 4
ONE                 *EDIT
*TWO                *E8
THREE               LOCATE
*FOUR               *OUTPUT FORMAT
FIVE                SUSPECT SITES
SIX                 *DELETE
*NINE               *IDENTIFIED
*BURF               *UNIDENTIFIED
CLEAR               *SCROLL LEFT
SEND                SCROLL RIGHT
*SHOW               EVENT 1
LOG                 *EVENT 8
ALL                 *MOUNTAIN
*FAN                SUBMIT
RANGE

TOTAL = 21          TOTAL = 20


* over 10% error rate
TABLE 7. Comparison of Error Rates of Phonetic Groups


                            # With           # With
                         < 10% Errors     > 10% Errors
Word Group     Total #   Act.    Exp.     Act.    Exp.      χ²     Sig.

Monosyllables     21       9    14.4       12     6.55     6.44    <.05
Final or
Vestigial T       20       9    13.8       11     6.44     5.28    <.05
Both               4       1     2.75       3     1.25     3.56     -
Neither           80      67    55.0       13    25.0      8.38    <.005

TOTAL            125      86               39
Other effects of phonetic structure on word recognition probably exist but
were not detectable in this vocabulary, which is limited in size and
diversity. Further research into the phonetic factors that affect machine
intelligibility is needed.